AITopics | network quantization

network quantization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Adaptive Proximal Gradient Methods for Structured Neural Networks

Neural Information Processing Systems, Feb-19-2026, 09:12:23 GMT

The technique of regularization is ubiquitous in machine learning as it can effectively prevent overfitting and yield better generalization.
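The listing gives only the abstract's opening line, but the title names proximal gradient methods for structured networks. As a minimal sketch under stated assumptions (a fixed step size and a group-lasso regularizer, neither taken from the paper, and none of the paper's adaptive machinery), one proximal gradient step takes a gradient step on the smooth loss and then applies the regularizer's proximal operator, which can zero out whole groups of weights at once:

```python
import math
from typing import List, Sequence


def group_soft_threshold(group: Sequence[float], threshold: float) -> List[float]:
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        return [0.0] * len(group)  # the whole group is zeroed: structured sparsity
    scale = 1.0 - threshold / norm  # otherwise the group is shrunk toward zero
    return [scale * v for v in group]


def proximal_gradient_step(groups, grads, lr: float, lam: float) -> List[List[float]]:
    """Gradient step on the smooth loss, then the group-lasso prox with weight lr * lam."""
    updated = []
    for w, g in zip(groups, grads):
        moved = [wi - lr * gi for wi, gi in zip(w, g)]
        updated.append(group_soft_threshold(moved, lr * lam))
    return updated


if __name__ == "__main__":
    # Hypothetical weights split into groups (e.g. one group per output channel);
    # zero gradients isolate the effect of the proximal step.
    weights = [[0.3, -0.4], [0.02, 0.01], [1.0, 0.0]]
    grads = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(proximal_gradient_step(weights, grads, lr=0.1, lam=1.0))
```

With `lr * lam = 0.1`, the small second group (norm about 0.022) is set exactly to zero while the larger groups are only shrunk; that all-or-nothing behavior per group is what makes the resulting sparsity structured.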

artificial intelligence, machine learning, regularization

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Improving Quantization-aware Training of Low-Precision Network via Block Replacement on Full-Precision Counterpart

Yu, Chengting, Yang, Shu, Zhang, Fengzhao, Ma, Hanzhi, Wang, Aili, Li, Er-Ping

arXiv.org Artificial Intelligence, Dec-20-2024

Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase simulates low-precision computation so that the quantization parameters are optimized in alignment with the task goals. However, direct training of low-precision networks generally faces two obstacles: 1. the low-precision model has limited representational capacity and cannot directly replicate full-precision calculations, which puts it at a disadvantage compared to full-precision alternatives; 2. non-ideal deviations during gradient propagation are a common consequence of using pseudo-gradients to approximate the derivatives of quantization functions. In this paper, we propose a general QAT framework that alleviates these concerns by letting the full-precision counterpart guide the forward and backward processes of the low-precision network during training. Alongside direct training of the quantized model, intermediate mixed-precision models are generated by block-by-block replacement on the full-precision model and work simultaneously with the low-precision backbone, which integrates quantized low-precision blocks into full-precision networks throughout training. Consequently, each quantized block can: 1. simulate full-precision representations during forward passes; 2. obtain better-estimated gradients during backward passes. We demonstrate that the proposed method achieves state-of-the-art results for 4-, 3-, and 2-bit quantization on ImageNet and CIFAR-10. The proposed framework is a compatible extension for most QAT methods and requires only a concise wrapper around existing code.
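The two ingredients the abstract builds on can be sketched in a few lines of plain Python (no deep learning framework). This is illustrative only: `fake_quantize` simulates low-precision computation during training, `ste_grad` is the straight-through estimator, one common pseudo-gradient for the non-differentiable rounding step, and `hybrid_models` is a hypothetical block-replacement helper in which each hybrid swaps a single full-precision block for its quantized counterpart. The paper's actual replacement schedule and guidance losses are not reproduced here.

```python
from typing import Callable, List

Block = Callable[[float], float]


def fake_quantize(x: float, scale: float, bits: int) -> float:
    """Simulate signed uniform quantization: scale, round, clip to the integer grid, rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # e.g. [-8, 7] for 4 bits
    q = min(max(round(x / scale), qmin), qmax)  # Python's round() rounds half to even
    return q * scale


def ste_grad(x: float, scale: float, bits: int, upstream: float) -> float:
    """Straight-through estimator: treat rounding as identity, block the gradient where x was clipped."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return upstream if qmin * scale <= x <= qmax * scale else 0.0


def linear_block(weight: float, bits: int = 0, scale: float = 0.25) -> Block:
    """Hypothetical toy block with one weight; bits > 0 fake-quantizes that weight."""
    w = fake_quantize(weight, scale, bits) if bits > 0 else weight
    return lambda x: w * x


def run(blocks: List[Block], x: float) -> float:
    """Forward pass through a chain of blocks."""
    for block in blocks:
        x = block(x)
    return x


def hybrid_models(fp_blocks: List[Block], q_blocks: List[Block]) -> List[List[Block]]:
    """Hypothetical block replacement: hybrid k uses quantized block k, full precision elsewhere."""
    return [fp_blocks[:k] + [q_blocks[k]] + fp_blocks[k + 1:] for k in range(len(fp_blocks))]


if __name__ == "__main__":
    weights = [1.7, -0.6, 0.9]
    fp = [linear_block(w) for w in weights]
    q = [linear_block(w, bits=4) for w in weights]
    print("full precision:", run(fp, 1.0))
    for k, hybrid in enumerate(hybrid_models(fp, q)):
        print("hybrid", k, run(hybrid, 1.0))
```

In a real QAT setup the same two functions would be applied elementwise to weight and activation tensors, and an autograd framework would call the straight-through rule in place of the true (almost everywhere zero) derivative of rounding.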

artificial intelligence, machine learning, quantization

2412.15846

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

Peters, Jorn, Fournarakis, Marios, Nagel, Markus, van Baalen, Mart, Blankevoort, Tijmen

arXiv.org Artificial Intelligence, Jul-10-2023

Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed-precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance for the same resource constraint than networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem, as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constrained optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance that are guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that it outperforms existing fixed- and mixed-precision methods under average bitwidth constraints commonly found in the literature.
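To make the constrained formulation concrete, here is a minimal sketch of bitwidth allocation under an average-bitwidth budget. It is not QBitOpt's solver: it assumes a hypothetical per-layer sensitivity score and a loss proxy proportional to `sensitivity * 4**(-bits)` (uniform-quantization noise variance shrinks roughly fourfold per extra bit), and it uses a simple greedy rule in place of the efficient solvers the abstract mentions. It does share the property the abstract stresses: the loop only stops once the budget is met, so any allocation it returns satisfies the constraint.

```python
from typing import List, Sequence


def avg_bits(alloc: Sequence[int], sizes: Sequence[int]) -> float:
    """Parameter-weighted average bitwidth of an allocation."""
    return sum(b * n for b, n in zip(alloc, sizes)) / sum(sizes)


def noise_proxy(sens: float, bits: int) -> float:
    """Hypothetical loss proxy: sensitivity times uniform-quantization noise (~4**(-bits))."""
    return sens * 4.0 ** (-bits)


def greedy_allocate(sens: Sequence[float], sizes: Sequence[int],
                    choices: Sequence[int], target: float) -> List[int]:
    """Lower bitwidths one step at a time, cheapest loss increase per saved bit first,
    until the average-bitwidth constraint holds."""
    choices = sorted(choices)
    idx = [len(choices) - 1] * len(sens)  # start every layer at the widest bitwidth
    while avg_bits([choices[i] for i in idx], sizes) > target:
        best, best_ratio = None, float("inf")
        for layer, i in enumerate(idx):
            if i == 0:
                continue  # already at the narrowest bitwidth
            hi, lo = choices[i], choices[i - 1]
            extra = noise_proxy(sens[layer], lo) - noise_proxy(sens[layer], hi)
            ratio = extra / ((hi - lo) * sizes[layer])  # loss increase per bit saved
            if ratio < best_ratio:
                best, best_ratio = layer, ratio
        if best is None:
            raise ValueError("target is below the smallest achievable average bitwidth")
        idx[best] -= 1
    return [choices[i] for i in idx]


if __name__ == "__main__":
    sens = [10.0, 1.0, 2.0]  # hypothetical sensitivities
    sizes = [100, 100, 100]  # hypothetical parameter counts
    alloc = greedy_allocate(sens, sizes, [2, 4, 8], target=3.5)
    print(alloc, avg_bits(alloc, sizes))
```

Starting from 8 bits everywhere, the loop lowers the least sensitive layers first, so the most sensitive layer keeps the most bits. Checking all `len(choices) ** len(sizes)` allocations would also work for three layers, but that count grows exponentially with depth, which is the difficulty the abstract points to.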

artificial intelligence, machine learning, quantization

2307.04535

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.75)

BOMP-NAS: Bayesian Optimization Mixed Precision NAS

van Son, David, de Putter, Floran, Vogel, Sebastian, Corporaal, Henk

arXiv.org Artificial Intelligence, Jan-27-2023

Bayesian Optimization Mixed-Precision Neural Architecture Search (BOMP-NAS) is an approach to quantization-aware neural architecture search (QA-NAS) that leverages both Bayesian optimization (BO) and mixed-precision quantization (MP) to efficiently search for compact, high-performance deep neural networks. The results show that integrating quantization-aware fine-tuning (QAFT) into the NAS loop is a necessary step to find networks that perform well under low-precision quantization: integrating it allows a model size reduction of nearly 50% on the CIFAR-10 dataset. BOMP-NAS finds neural networks that achieve state-of-the-art performance at much lower design cost. This study shows that BOMP-NAS can find these networks with a 6x shorter search time than the closest related work.
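Model-size figures such as the reduction quoted above come down to counting weight bits per layer. The sketch below assumes hypothetical layer sizes and bitwidths and ignores the small overhead of storing quantization scales; it only shows how a mixed-precision assignment changes the storage budget that a search such as BOMP-NAS trades off against accuracy.

```python
from typing import Sequence


def model_size_bytes(param_counts: Sequence[int], bitwidths: Sequence[int]) -> float:
    """Weight storage only: sum of params * bits per layer, converted to bytes."""
    return sum(n * b for n, b in zip(param_counts, bitwidths)) / 8


def size_reduction(param_counts: Sequence[int], baseline_bits: Sequence[int],
                   mixed_bits: Sequence[int]) -> float:
    """Fractional size reduction of a mixed-precision model relative to a baseline."""
    return 1.0 - model_size_bytes(param_counts, mixed_bits) / model_size_bytes(param_counts, baseline_bits)


if __name__ == "__main__":
    params = [1_000, 4_000, 2_000, 500]  # hypothetical per-layer weight counts
    uniform8 = [8, 8, 8, 8]
    mixed = [8, 4, 4, 8]  # hypothetical mixed-precision assignment
    print(model_size_bytes(params, uniform8), model_size_bytes(params, mixed))
    print(f"reduction: {size_reduction(params, uniform8, mixed):.0%}")
```

For these made-up numbers the weights shrink from 7,500 to 4,500 bytes, a 40% reduction; the two large middle layers dominate, which is why a search tends to push them to lower bitwidths first.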

artificial intelligence, machine learning, quantization

2301.1181

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

OMPQ: Orthogonal Mixed Precision Quantization

Ma, Yuexiao, Jin, Taisong, Zheng, Xiawu, Wang, Yan, Li, Huixia, Wu, Yongjian, Jiang, Guannan, Zhang, Wei, Ji, Rongrong

arXiv.org Artificial Intelligence, Nov-23-2022

To bridge the ever-increasing gap between the complexity of deep neural networks and hardware capability, network quantization has attracted growing research attention. The latest trend, mixed-precision quantization, takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization. However, this also results in a difficult integer programming formulation and forces most existing approaches to use an extremely time-consuming search process, even with various relaxations. Instead of solving the original integer program, we propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer program but easy to optimize with linear programming. This approach reduces the search time and the amount of data required by orders of magnitude, with little compromise in quantization accuracy. Specifically, we achieve 72.08% Top-1 accuracy on ResNet-18 with 6.7Mb, without requiring any searching iterations. Given the high efficiency and low data dependency of our algorithm, we also apply it to post-training quantization, achieving 71.27% Top-1 accuracy on MobileNetV2 with only 1.5Mb. Our code is available at https://github.com/MAC-AutoML/OMPQ.
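For contrast with OMPQ's linear-programming proxy, the sketch below solves a toy instance of the original integer program by exhaustive search. The per-layer sensitivities, the `sensitivity * 4**(-bits)` loss proxy, and the bit budget are all hypothetical stand-ins; OMPQ's orthogonality metric is not implemented here. The point is the loop count: every one of `len(choices) ** num_layers` configurations has to be visited.

```python
import itertools
from typing import Optional, Sequence, Tuple


def brute_force_allocation(
    sens: Sequence[float],
    sizes: Sequence[int],
    choices: Sequence[int],
    budget_bits: int,
) -> Tuple[Optional[Tuple[int, ...]], int]:
    """Exhaustively search per-layer bitwidths minimizing a loss proxy under a size budget."""
    best_cfg, best_cost, evaluated = None, float("inf"), 0
    for cfg in itertools.product(choices, repeat=len(sens)):
        evaluated += 1
        if sum(b * n for b, n in zip(cfg, sizes)) > budget_bits:
            continue  # infeasible: exceeds the model-size budget
        cost = sum(s * 4.0 ** (-b) for s, b in zip(sens, cfg))  # hypothetical loss proxy
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, evaluated


if __name__ == "__main__":
    cfg, n = brute_force_allocation([10.0, 1.0, 2.0], [100, 100, 100], (2, 4, 8), budget_bits=1050)
    print(cfg, n)
```

For this toy instance the search visits 27 configurations and keeps (4, 2, 4); with three bitwidth choices and 50 layers the same loop would need 3**50 (about 7e23) visits, which is why relaxations or proxies become necessary.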

artificial intelligence, machine learning, quantization

2109.07865

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Zero-Shot Learning of a Conditional Generative Adversarial Network for Data-Free Network Quantization

Choi, Yoojin, El-Khamy, Mostafa, Lee, Jungwon

arXiv.org Artificial Intelligence, Oct-25-2022

We propose a novel method, called zero-shot learning of a CGAN (ZS-CGAN), for training a conditional generative adversarial network (CGAN) without using any training data. Zero-shot learning of a conditional generator needs only a pre-trained discriminative (classification) model and no training data. In particular, the conditional generator is trained to produce labeled synthetic samples whose characteristics mimic the original training data, using the statistics stored in the batch normalization layers of the pre-trained model. We show the usefulness of ZS-CGAN in data-free quantization of deep neural networks, achieving state-of-the-art data-free quantization of ResNet and MobileNet classification models trained on the ImageNet dataset. Data-free quantization using ZS-CGAN showed minimal accuracy loss compared to conventional data-dependent quantization.
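The key signal in the abstract is the batch-normalization statistics stored in the pre-trained model. A common way to use them, shown here as a sketch, is a loss that compares each channel's mean and standard deviation on a synthetic batch with the stored running statistics. The exact form ZS-CGAN uses (which statistics, which distance, how layers are weighted, and how it is combined with a term for the conditional labels) is an assumption; in a real setup this loss would be summed over BN layers and backpropagated into the generator.

```python
import math
from typing import List, Sequence, Tuple


def channel_stats(batch: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and biased variance; `batch` holds one row of channel values per sample."""
    n = len(batch)
    channels = list(zip(*batch))
    means = [sum(c) / n for c in channels]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(channels, means)]
    return means, variances


def bn_statistics_loss(features: Sequence[Sequence[float]],
                       bn_mean: Sequence[float], bn_var: Sequence[float]) -> float:
    """Squared distance between synthetic-batch statistics and a BN layer's running statistics."""
    means, variances = channel_stats(features)
    mean_term = sum((m - mu) ** 2 for m, mu in zip(means, bn_mean))
    std_term = sum((math.sqrt(v) - math.sqrt(s)) ** 2 for v, s in zip(variances, bn_var))
    return mean_term + std_term


if __name__ == "__main__":
    # Hypothetical features of a synthetic batch (2 samples x 2 channels) at one BN layer.
    synthetic = [[1.0, 0.0], [3.0, 0.0]]
    print(bn_statistics_loss(synthetic, bn_mean=[2.0, 0.0], bn_var=[1.0, 0.0]))
    print(bn_statistics_loss(synthetic, bn_mean=[0.0, 0.0], bn_var=[1.0, 0.0]))
```

A loss of zero means the synthetic batch already matches the layer's stored statistics; the generator is pushed toward samples for which this holds at every BN layer.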

artificial intelligence, machine learning, quantization

2210.14392

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > Promising Solution (0.48)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Kernel Quantization for Efficient Network Compression

Yu, Zhongzhi, Shi, Yemin, Huang, Tiejun, Yu, Yizhou

arXiv.org Machine Learning, Mar-11-2020

This paper presents Kernel Quantization (KQ), a novel network compression framework that aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version without significant performance loss. Unlike existing methods that struggle with weight bit-length, KQ has the potential to improve the compression ratio by treating the convolution kernel as the quantization unit. Inspired by the evolution from weight pruning to filter pruning, we propose to quantize at both the kernel and weight levels. Instead of representing each weight parameter with a low-bit index, we learn a kernel codebook and replace every kernel in the convolution layer with its corresponding low-bit index. KQ can thus represent the weight tensor of a convolution layer with low-bit indexes and a kernel codebook of limited size, which enables a significant compression ratio. We then apply 6-bit parameter quantization to the kernel codebook to further reduce redundancy. Extensive experiments on the ImageNet classification task show that KQ needs 1.05 and 1.62 bits on average in VGG and ResNet18, respectively, to represent each parameter in the convolution layer, achieving a state-of-the-art compression ratio with little accuracy loss.
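The compression arithmetic behind KQ is easy to check. With 3x3 kernels (9 weights each) and a codebook of K entries, each kernel is stored as a log2(K)-bit index, so K = 512 gives exactly 1 bit per weight before the codebook overhead, which the 6-bit codebook quantization keeps small. The sketch below assumes hypothetical layer and codebook sizes and uses plain nearest-codeword assignment; how KQ actually learns its codebook is not reproduced.

```python
import math
from typing import List, Sequence


def assign_kernels(kernels: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each flattened kernel by the index of its nearest codeword (squared Euclidean distance)."""
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda j: dist(k, codebook[j])) for k in kernels]


def bits_per_parameter(num_kernels: int, kernel_size: int,
                       codebook_size: int, codebook_bits: int) -> float:
    """Average bits per weight: one ceil(log2 K)-bit index per kernel plus the amortized codebook."""
    index_bits = num_kernels * math.ceil(math.log2(codebook_size))
    codebook_storage = codebook_size * kernel_size * codebook_bits
    return (index_bits + codebook_storage) / (num_kernels * kernel_size)


if __name__ == "__main__":
    # Toy 2-weight "kernels" and a 2-entry codebook, just to show the index replacement.
    print(assign_kernels([[0.0, 0.1], [0.9, 1.1], [0.2, 0.0]], [[0.0, 0.0], [1.0, 1.0]]))
    # Hypothetical sizes: one million 3x3 kernels, a 512-entry codebook stored at 6 bits.
    print(bits_per_parameter(1_000_000, 9, 512, 6))
```

For these assumed sizes the result is about 1.003 bits per weight; the codebook's share grows as the number of kernels sharing it shrinks.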

codebook, compression ratio, quantization

2003.05148

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

8 Neural Network Compression Techniques For ML Developers

#artificialintelligence, Nov-29-2019, 09:14:04 GMT

In addition, recent years have witnessed significant progress in virtual reality, augmented reality, and smart wearable devices, creating challenges in deploying deep learning systems to portable devices with limited resources (e.g. …). Now let's take a look at a few papers that introduced novel compression models. In this paper, the authors propose two novel network quantization approaches: single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ). Network quantization is considered at both the width and depth levels. In this paper, the authors propose an efficient method for obtaining the rank configuration of the whole network. Unlike previous methods, which consider each layer separately, this method considers the whole network when choosing the rank configuration. It combines three techniques: value quantization with sparsity multiplication, base encoding, and zero-run encoding.
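Zero-run encoding, the last of the three techniques named above, exploits the long runs of zeros that sparsity and coarse quantization leave behind. A minimal sketch follows, assuming a simple token format (nonzero values pass through, each run of zeros becomes a `("Z", length)` pair); the exact format used in the summarized paper, and how it interacts with base encoding, are not described in the excerpt.

```python
from typing import List, Tuple, Union

Token = Union[int, Tuple[str, int]]


def zero_run_encode(values: List[int]) -> List[Token]:
    """Copy nonzero values; collapse each run of zeros into a ("Z", run_length) token."""
    encoded: List[Token] = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
            continue
        if run:
            encoded.append(("Z", run))
            run = 0
        encoded.append(v)
    if run:  # flush a trailing run of zeros
        encoded.append(("Z", run))
    return encoded


def zero_run_decode(encoded: List[Token]) -> List[int]:
    """Inverse of zero_run_encode."""
    out: List[int] = []
    for token in encoded:
        if isinstance(token, tuple):
            out.extend([0] * token[1])
        else:
            out.append(token)
    return out


if __name__ == "__main__":
    quantized = [3, 0, 0, 0, -1, 0, 2, 0, 0]  # hypothetical quantized, sparse weights
    enc = zero_run_encode(quantized)
    print(enc)
    print(zero_run_decode(enc) == quantized)
```

The encoding is lossless, so the savings depend entirely on how long the zero runs are; a dense tensor gains nothing.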

network quantization, neural network compression technique, quantization

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

AutoQB: AutoML for Network Quantization and Binarization on Mobile Devices

Lou, Qian, Liu, Lantao, Kim, Minje, Jiang, Lei

arXiv.org Machine Learning, Feb-15-2019

In this paper, we propose a hierarchical deep reinforcement learning (DRL)-based AutoML framework, AutoQB, to automatically explore the design space of channel-level network quantization and binarization for hardware-friendly deep learning on mobile devices. Compared to prior DDPG-based quantization techniques, AutoQB, on various CNN models, automatically achieves the same inference accuracy with ~79% less computing overhead, or improves inference accuracy by ~2% at the same computing cost.
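AutoQB's agent chooses a bitwidth per channel; the sketch below shows only what applying such a choice might look like. The per-channel bitwidths are fixed by hand rather than learned, 1 bit falls back to binarization with a mean-absolute-value scale (as in XNOR-Net), and wider bitwidths use symmetric uniform quantization with a per-channel scale. These quantizer choices are assumptions, not details from the paper.

```python
from typing import List, Sequence


def binarize_channel(weights: Sequence[float]) -> List[float]:
    """1 bit: sign(w) * alpha, with alpha = mean |w| of the channel (XNOR-Net-style scaling)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def quantize_channel(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization with a per-channel scale; bits == 1 means binarization."""
    if bits == 1:
        return binarize_channel(weights)
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0.0] * len(weights)
    scale = max_abs / qmax
    return [round(w / scale) * scale for w in weights]


def quantize_layer(channels: Sequence[Sequence[float]],
                   bits_per_channel: Sequence[int]) -> List[List[float]]:
    """Apply a (hypothetical, hand-picked) bitwidth to each output channel independently."""
    return [quantize_channel(c, b) for c, b in zip(channels, bits_per_channel)]


if __name__ == "__main__":
    layer = [[0.5, -1.5, 1.0], [0.6, -0.2]]  # hypothetical two-channel layer
    print(quantize_layer(layer, [1, 2]))
```

In AutoQB the interesting part is the search that fills in `bits_per_channel` against accuracy and hardware cost; the quantizers themselves stay this simple.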

accuracy, autoqb, inference accuracy

1902.0569

Country: North America > United States > Indiana (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Fast Adjustable Threshold For Uniform Neural Network Quantization

Goncharenko, Alexander, Denisov, Andrey, Alyamkin, Sergey, Terentev, Evgeny

arXiv.org Machine Learning, Dec-19-2018

Neural network quantization is a necessary step for porting neural networks to mobile devices. Quantization accelerates inference and reduces memory consumption and model size. It can be performed without fine-tuning by using a calibration procedure (computing the parameters needed for quantization), or the network can be trained with quantization from scratch. Training with quantization from scratch on labeled data is a rather long and resource-consuming procedure. Quantizing a network without fine-tuning leads to an accuracy drop because of outliers that appear during calibration. In this article, we propose to simplify the quantization procedure significantly by introducing trained scale factors for the quantization thresholds. This speeds up quantization with fine-tuning, bringing it down to at most 8 epochs, and reduces the number of training images required. To our knowledge, the proposed method allowed us to obtain the first publicly available quantized version of MNAS without significant accuracy reduction: 74.8% vs. 75.3% for the original full-precision network.
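The abstract's point about outliers can be reproduced on toy data: with a max-based threshold, one large value stretches the quantization grid and wastes resolution on the bulk of the values. In the sketch below the threshold is `alpha * max|x|`; the paper trains such scale factors by fine-tuning, whereas here a grid search over `alpha` minimizing mean squared error stands in for training. The data, bitwidth, and candidate grid are hypothetical.

```python
from typing import List, Optional, Sequence


def quantize_with_threshold(values: Sequence[float], threshold: float, bits: int = 8) -> List[float]:
    """Symmetric uniform quantization over [-threshold, threshold]; larger values are clipped."""
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_threshold_scale(values: Sequence[float], bits: int = 8,
                         candidates: Optional[Sequence[float]] = None) -> float:
    """Choose alpha in T = alpha * max|x| minimizing MSE (a grid-search stand-in for training alpha)."""
    max_abs = max(abs(v) for v in values)
    grid = candidates or [i / 20 for i in range(1, 21)]  # 0.05, 0.10, ..., 1.00
    return min(grid, key=lambda a: mse(values, quantize_with_threshold(values, a * max_abs, bits)))


if __name__ == "__main__":
    # Hypothetical data: 1001 evenly spaced values in [-1, 1] plus one outlier at 10.
    data = [i / 500 - 1 for i in range(1001)] + [10.0]
    alpha = best_threshold_scale(data, bits=4)
    print("alpha:", alpha)
    print("MSE at alpha:", mse(data, quantize_with_threshold(data, alpha * 10.0, 4)))
    print("MSE at max:  ", mse(data, quantize_with_threshold(data, 10.0, 4)))
```

Clipping the single outlier costs one large error, but the finer grid it buys for the other 1001 values more than pays for it, so the chosen `alpha` ends up below 1.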

artificial intelligence, machine learning, neural network

1812.07872

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
